Terrorism is the unlawful use of force or violence against persons or property to intimidate or coerce a government, the civilian population, or any segment thereof, in furtherance of political or social objectives. It targets ethnic or religious groups, governments and political parties, corporations, and media enterprises. Terrorism that occurs throughout the world is known as global terrorism. It is arguably among the most destructive types of crime: not only does it kill people, it also destroys livelihoods, economies, and a civilized world order that took millennia to form. The results of terrorism are almost always catastrophic. Individuals or groups that commit these crimes are called terrorists, and they exist all over the world. A few operate alone, but most belong to one of many global organizations.
In this project, I want to draw inferences about which countries have been hit hardest by terrorism over the years, and which terrorist organisations have caused the most damage.
The Global Terrorism Database (GTD) is the most comprehensive unclassified database of terrorist attacks in the world. The National Consortium for the Study of Terrorism and Responses to Terrorism (START) makes the GTD available online in an effort to improve understanding of terrorist violence, so that it can be more readily studied and defeated. The GTD is produced by a dedicated team of researchers and technical staff.
The GTD is an open-source database that provides information on domestic and international terrorist attacks around the world since 1970, and it now includes more than 200,000 events. For each event, a wide range of information is available, including the date and location of the incident, the weapons used, the nature of the target, the number of casualties, and, when identifiable, the group or individual responsible. Link to the dataset: https://www.start.umd.edu/gtd/access/
Review 1: Select the dataset, clean and preprocess it, and visualise the data using different types of graphs.
Review 2: Use the latitude and longitude variables to visualise the data on maps in RStudio.
Review 3: Use Tableau for data visualisation and complete the required documentation.
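As a rough illustration of the map step in Review 2, the sketch below plots three events with leaflet. The coordinates and city names are copied from the first rows of the data preview further down; the object names `pts` and `m` and the marker styling are assumptions made for this example, not the project's actual map code.

```r
library(leaflet)

# Three events taken from the data preview (city, latitude, longitude).
# Any data frame with latitude/longitude columns can be used the same way.
pts <- data.frame(
  city      = c("Santo Domingo", "Mexico city", "Athens"),
  latitude  = c(18.45679, 19.37189, 37.99749),
  longitude = c(-69.95116, -99.08662, 23.76273)
)

m <- leaflet(pts) %>%
  addTiles() %>%                        # default OpenStreetMap base layer
  addCircleMarkers(
    lng = ~longitude, lat = ~latitude,  # formulas refer to columns of pts
    radius = 4, stroke = FALSE, fillOpacity = 0.6,
    popup = ~city                       # city name shown on click
  )

# Print `m` in RStudio (or knit the document) to render the interactive map.
```

With the full data, rows with missing coordinates would probably need to be filtered out first, and plotting every event at once may be slow, so sampling or clustering the markers is worth considering.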
library(tidyverse)
library(data.table)
library(lubridate)
library(RColorBrewer)
library(gridExtra)
library(plotly)
library(ggthemes)
library(wesanderson)
library(leaflet)
library(VIM)
# Read the GTD export; treat empty strings and "NA" as missing values
dt <- as_tibble(fread("globalterrorismdb_0718dist.csv",
na.strings = c("", "NA")))
There are 135 variables in the original data. We’ll select variables that are relatively easy to interpret and have fewer missing values: year, month, location, number of people killed, ransom, suicide…
gbtr <- select(dt, c(1,2,3,4,9,11,12,13,14,15,18,27,28,59,99,113,117))
# A month or day of 0 means the value is unknown, so recode it as NA
gbtr$imonth[gbtr$imonth==0] <- NA
gbtr$iday[gbtr$iday==0] <- NA
# Events from 2000 onward (the zeros were already recoded above)
gbtr2k <- gbtr %>% filter(iyear>=2000)
glimpse(gbtr)
## Rows: 181,691
## Columns: 17
## $ eventid <int64> 9.733093e-313, 9.733093e-313, 9.733143e-313, 9.733143...
## $ iyear <int> 1970, 1970, 1970, 1970, 1970, 1970, 1970, 1970, 1970, 1...
## $ imonth <int> 7, NA, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ iday <int> 2, NA, NA, NA, NA, 1, 2, 2, 2, 3, 1, 6, 8, 9, 9, 10, 11...
## $ country_txt <chr> "Dominican Republic", "Mexico", "Philippines", "Greece"...
## $ region_txt <chr> "Central America & Caribbean", "North America", "Southe...
## $ provstate <chr> NA, "Federal", "Tarlac", "Attica", "Fukouka", "Illinois...
## $ city <chr> "Santo Domingo", "Mexico city", "Unknown", "Athens", "F...
## $ latitude <dbl> 18.45679, 19.37189, 15.47860, 37.99749, 33.58041, 37.00...
## $ longitude <dbl> -69.95116, -99.08662, 120.59974, 23.76273, 130.39636, -...
## $ location <chr> NA, NA, NA, NA, NA, NA, NA, "Edes Substation", NA, NA, ...
## $ success <int> 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1...
## $ suicide <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ gname <chr> "MANO-D", "23rd of September Communist League", "Unknow...
## $ nkill <int> 1, 0, 1, NA, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, NA, 1, 0...
## $ nhours <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ ransom <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
head(gbtr)
## # A tibble: 6 x 17
## eventid iyear imonth iday country_txt region_txt provstate city latitude
## <int64> <int> <int> <int> <chr> <chr> <chr> <chr> <dbl>
## 1 9.7330~ 1970 7 2 Dominican ~ Central A~ <NA> Sant~ 18.5
## 2 9.7330~ 1970 NA NA Mexico North Ame~ Federal Mexi~ 19.4
## 3 9.7331~ 1970 1 NA Philippines Southeast~ Tarlac Unkn~ 15.5
## 4 9.7331~ 1970 1 NA Greece Western E~ Attica Athe~ 38.0
## 5 9.7331~ 1970 1 NA Japan East Asia Fukouka Fuko~ 33.6
## 6 9.7331~ 1970 1 1 United Sta~ North Ame~ Illinois Cairo 37.0
## # ... with 8 more variables: longitude <dbl>, location <chr>, success <int>,
## # suicide <int>, gname <chr>, nkill <int>, nhours <dbl>, ransom <int>
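Selecting columns by position, as done above, breaks silently if the column order of the file changes. A minimal sketch of the same idea using column names is shown below on a toy table. The names come from the `glimpse()` output; `toy`, `keep`, `toy_sel`, and `extra_col` are hypothetical and exist only for this example.

```r
library(dplyr)

# Toy stand-in for the raw table (hypothetical values); the real data has 135 columns.
# extra_col represents any column we do not want to keep.
toy <- tibble(
  eventid   = c(1, 2, 3),
  iyear     = c(1970L, 1970L, 2001L),
  imonth    = c(7L, 0L, 9L),
  iday      = c(2L, 0L, 11L),
  nkill     = c(1L, 0L, NA),
  extra_col = c("a", "b", "c")
)

# Columns to keep, referred to by name rather than by position
keep <- c("eventid", "iyear", "imonth", "iday", "nkill")

toy_sel <- toy %>%
  select(all_of(keep)) %>%
  # na_if() replaces the sentinel value 0 with NA in both date columns
  mutate(across(c(imonth, iday), ~ na_if(.x, 0L)))

toy_sel
```

`all_of()` raises an error if one of the named columns is absent, which makes a changed file layout visible immediately.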
matrixplot(gbtr, sortby = c("nkill"))
aggr(gbtr, labels=names(gbtr),cex.axis = .9)
Variables such as location, nhours, and ransom have a large number of missing values. EDA with these variables will be avoided.
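The missingness plots can be backed up with numbers. The sketch below computes the share of missing values per column on a small toy data frame. The values are hypothetical, and the 0.5 cutoff is an arbitrary assumption rather than a rule used in this analysis.

```r
# Toy data frame with hypothetical values, only to show the calculation
toy <- data.frame(
  nkill    = c(1, 0, NA, 2),
  nhours   = c(NA, NA, NA, 5),
  location = c(NA, "Edes Substation", NA, NA)
)

# Share of missing values per column, sorted from most to least missing
na_share <- sort(colMeans(is.na(toy)), decreasing = TRUE)
na_share

# Columns whose missing share exceeds the chosen cutoff (0.5 is an assumption)
too_sparse <- names(na_share)[na_share > 0.5]
too_sparse
```

Running the same two lines on `gbtr` instead of `toy` would give the exact missing share for each of the 17 selected variables.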
p <- gbtr %>% mutate(iyear=as.factor(iyear)) %>%
group_by(iyear) %>% count() %>%
ggplot(aes(x=iyear,y=n,group=1)) +
geom_line(size=1, color="brown")+
geom_point(color="brown") +
scale_x_discrete(
breaks=c("1970", "2000","2008", "2011", "2014","2017")
) +
labs(title = "Event by year", x = "year", y = "count")+
theme_economist()
p
There has been a rapid increase in terrorist events since 2000. We’ll look at the trend separately by region.
p4 <- gbtr %>% count(region_txt, iyear) %>%
ggplot(aes(iyear, n,color=region_txt)) +
geom_line(aes(group=region_txt)) +
labs(title = "Trend by Region", x="year", y="count", color="region")+
theme_light()
ggplotly(p4)
Hover over the plot to see the region labels. Middle East & North Africa and South Asia are the regions mainly responsible for the spike in the data.
Since there is a steep upward trend from approximately 2000 onward, we’ll inspect the periods before and after 2000 separately.
p2 <- gbtr %>% mutate(pd=ifelse(iyear<2000,"before 2000", "after 2000")) %>%
mutate(pd = factor(pd, levels = c("before 2000", "after 2000")))%>%
group_by(region_txt, pd) %>% count() %>%
ggplot(aes(x=reorder(region_txt, n), y=n))+
geom_bar(aes( fill=pd), stat= "identity", position = "dodge")+
labs(title = "Events by region", x = "region", y = "count", fill = "period")+
theme_economist()+
scale_fill_manual(values = c("#66b2b2","#006666")) +
coord_flip()
p2
After 2000, “Middle East & North Africa” became the region with the most terrorist attacks (before 2000 it was “South America”).
“South Asia” saw the largest increase in terrorism since the 70s.
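The before/after comparison can also be tabulated directly. The sketch below counts events per region in each period and takes the difference, using the same rule as the plot (years below 2000 versus everything else, so the year 2000 itself falls into the later period). The toy rows and the names `toy` and `period_counts` are hypothetical.

```r
library(dplyr)

# Toy events (hypothetical), one row per event
toy <- tibble(
  region_txt = c("A", "A", "A", "B", "B", "B", "B"),
  iyear      = c(1985L, 1999L, 2000L, 1990L, 2005L, 2010L, 2016L)
)

period_counts <- toy %>%
  group_by(region_txt) %>%
  summarise(
    before = sum(iyear < 2000),    # events in years before 2000
    after  = sum(iyear >= 2000),   # events in 2000 and later
    .groups = "drop"
  ) %>%
  mutate(change = after - before)  # positive values mean more events after 2000

period_counts
```

The two periods are not equally long (30 versus 18 calendar years, assuming the data run from 1970 to 2017 as the axis breaks of the yearly plot suggest), so dividing each count by the number of years in its period would give a fairer comparison than raw totals.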
pkr <- gbtr2k %>% filter(!is.na(nkill)) %>% group_by(region_txt) %>%
summarise(ksum=sum(nkill)) %>%
ggplot(aes(reorder(region_txt,ksum), ksum))+
geom_bar(stat = "identity", fill="#2E8B57")+
coord_flip()+
labs(title = "Num. of kills by region", subtitle = "without missing values, after 2000", x="region", y="count")+
theme_economist()
## `summarise()` ungrouping output (override with `.groups` argument)
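# count() keeps the region grouping, so top_n(10, n) is applied within each one-row group and every region is retained
per <- gbtr2k %>% group_by(region_txt) %>% count() %>% top_n(10,n) %>%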
ggplot(aes(x=reorder(region_txt, n), y=n))+
geom_bar(stat= "identity", fill="#006666")+
labs(title = "Events by region",subtitle = "after 2000", x = "region", y = "count")+
theme_economist()+
coord_flip()
grid.arrange(pkr,per,ncol=2)
We’ll look at the data from 2000 onward.
pec <- gbtr2k %>% group_by(country_txt) %>% count() %>% ungroup() %>%
top_n(n=20,wt = n) %>%
ggplot(aes(reorder(country_txt, n), n))+
geom_bar(stat = "identity", fill="#21618C") +
labs(title = "Event by country", subtitle = "after 2000", x = "Country", y = "Count") +
theme_economist() +
scale_fill_manual(values = wes_palette(n=4,"Cavalcanti1"))+
coord_flip()
pec
dtscd <- gbtr2k %>% filter(!is.na(suicide)) %>% group_by(region_txt, suicide) %>% count() %>%
ungroup() %>% group_by(region_txt) %>% mutate(pct=n/sum(n)) %>% filter(suicide==1)
ggplot(dtscd, aes(reorder(region_txt, pct), pct*100)) +
geom_bar(stat = "identity", fill="#5D6D7E")+
coord_flip()+
labs(title = "Pct of suicide attack by region", subtitle = "after 2000", x="region",y="%")+
theme_economist()
gbtr %>%filter(gname!="Unknown") %>% group_by(gname,suicide) %>% summarise(n=n()) %>%
ungroup() %>% group_by(gname) %>% mutate(sum=sum(n)) %>% ungroup() %>% top_n(30,sum) %>%
ggplot(aes(x=reorder(gname,sum),n, fill=factor(suicide, levels = c(1,0)))) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = "Groups and attacks", x="groups", y="attacks", fill="suicide") +
theme_economist_white() +
scale_fill_manual(values = wes_palette(n=2, "Cavalcanti1"))
## `summarise()` regrouping output by 'gname' (override with `.groups` argument)
The chart above disregards attacks attributed to “Unknown” groups.
wp <- dt %>% select(1,2,3,4,9,11,13,14,15,27,28,30,59,83,85,99,102,117)
wp$imonth[wp$imonth==0] <- NA
wp$iday[wp$iday==0] <- NA
patkrg<- wp %>% group_by(region_txt, attacktype1_txt) %>% count() %>%
ggplot(aes(region_txt, n, fill=attacktype1_txt)) +
geom_bar(stat = "identity",position = "stack")+
scale_fill_manual(values = wes_palette("Darjeeling1" ,n=9, type="continuous"))+
theme_economist()+
scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = 0.8))+
labs(title = "Attack type by region",
x="region", y="num.", fill="attack type")
patkrg2<- wp %>% group_by(region_txt, attacktype1_txt) %>% count() %>%
ggplot(aes(region_txt, n, fill=attacktype1_txt)) +
geom_bar(stat = "identity",position = "fill")+
scale_fill_manual(values = wes_palette("Darjeeling1" ,n=9, type="continuous"))+
theme_economist()+
scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = 0.8))+
labs(title = "Attack type share by region",
x="region", y="proportion", fill="attack type")
patkrg
patkrg2
Different groups might prefer different types of attack. There are 3537 groups in the data. We’ll look at the groups with the most attacks.
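The code that follows refers to an object `grp` that is not defined anywhere in the text shown. A minimal sketch of how such an object could be built is given below, assuming it holds the named groups with the most recorded attacks. The cutoff `n_top` and the toy table `toy_wp` are hypothetical; the number of groups used originally is not stated.

```r
library(dplyr)

# Toy stand-in for `wp` (hypothetical rows); only gname matters here
toy_wp <- tibble(
  gname = c("Unknown", "Unknown", "Unknown",
            "Group A", "Group A", "Group A",
            "Group B", "Group B",
            "Group C")
)

# ASSUMPTION: grp = named groups with the most attacks.
# The cutoff of 2 is illustrative only; the original value is not given.
n_top <- 2

grp <- toy_wp %>%
  filter(gname != "Unknown") %>%   # unattributed events are not a group
  count(gname) %>%                 # attacks per group, stored in column n
  slice_max(n, n = n_top, with_ties = FALSE)

grp$gname
```

On the real data, `toy_wp` would be replaced by `wp` and `n_top` by whatever number of groups the charts are meant to show.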
wp %>% filter(gname %in% grp$gname)%>%
group_by(gname, attacktype1_txt) %>% count() %>%
ggplot(aes(gname, n, fill= attacktype1_txt))+
geom_bar(stat = "identity",position = "stack")+
scale_fill_manual(values = wes_palette("Darjeeling1",n=9, type="continuous"))+
theme_economist()+
scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = 5))+
labs(title = "Attack type by groups", subtitle = "Groups with the most attacks",
x="groups", y="attacks", fill="attack type")
# Share of each group's bombings that were suicide attacks
wp %>% filter(attacktype1_txt=="Bombing/Explosion" & gname %in% grp$gname ) %>%
group_by(gname, suicide) %>% count() %>% ungroup() %>%
group_by(gname) %>% mutate(pct=n/sum(n)) %>%
filter(suicide==1) %>% arrange(desc(pct))
## # A tibble: 6 x 4
## # Groups: gname [6]
## gname suicide n pct
## <chr> <int> <int> <dbl>
## 1 Boko Haram 1 435 0.520
## 2 Islamic State of Iraq and the Levant (ISIL) 1 1258 0.342
## 3 Taliban 1 646 0.225
## 4 Al-Shabaab 1 149 0.122
## 5 Kurdistan Workers' Party (PKK) 1 26 0.0355
## 6 Revolutionary Armed Forces of Colombia (FARC) 1 2 0.00216
wp %>% filter(!is.na(nkill)&attacktype1_txt!="Unknown") %>%
group_by(region_txt,attacktype1_txt) %>%
summarise(sumk=sum(nkill), event=n(), kperattack=sum(nkill)/n()) %>%
ggplot(aes(reorder(attacktype1_txt, kperattack), kperattack))+
geom_bar(aes(fill=region_txt), stat = "identity")+
coord_flip()+
facet_wrap(.~ region_txt, ncol = 4, scales = "free_x")+
labs(title = "Deaths per attack by attack type and region", x="attack type", y="deaths per event")+
scale_fill_manual(values = wes_palette("Darjeeling1", n=12, type = "continuous"))+
theme(legend.position = "none")
## `summarise()` regrouping output by 'region_txt' (override with `.groups` argument)
The attack types that cause the most deaths per attack differ drastically from region to region.
Bombing (to my surprise) isn’t responsible for the most deaths per attack. Instead, it’s armed assault and hostage taking in most regions.
Hostage taking has the most deaths per attack in East Asia, Eastern Europe, Middle East & North Africa, South Asia, Southeast Asia, Sub-Saharan Africa and Western Europe.
North America’s extreme values reflect the 9/11 attacks in 2001, with nearly 3,000 recorded deaths in 4 attacks.
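The North America case shows how a handful of events can dominate a deaths-per-attack average. As a rough sketch, the toy example below compares the mean (the quantity plotted above, total deaths divided by number of events) with the median, which is far less sensitive to a single extreme event. All numbers and the names `toy` and `per_type` are hypothetical and are not taken from the GTD.

```r
library(dplyr)

# Hypothetical toy numbers: several small events plus one mass-casualty event
toy <- tibble(
  attacktype1_txt = c(rep("Bombing/Explosion", 5), rep("Hijacking", 4)),
  nkill           = c(0, 1, 0, 2, 2,   0, 0, 0, 1000)
)

per_type <- toy %>%
  group_by(attacktype1_txt) %>%
  summarise(
    events       = n(),
    mean_kills   = mean(nkill),    # same quantity as sum(nkill) / n()
    median_kills = median(nkill),  # robust to one extreme event
    .groups = "drop"
  )

per_type
```

In this toy table the hijacking mean is 250 deaths per event while the median is 0, so reporting both gives a more balanced picture when a category contains very few, very deadly events.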